Using Corpus Statistics and WordNet Relations for Sense Identification
Abstract
Corpus-based approaches to word sense identification have flexibility and generality but suffer from a knowledge acquisition bottleneck. We show how knowledge-based techniques can be used to open the bottleneck by automatically locating training corpora. We describe a statistical classifier that combines topical context with local cues to identify a word sense. The classifier is used to disambiguate a noun, a verb, and an adjective. A knowledge base in the form of WordNet's lexical relations is used to automatically locate training examples in a general text corpus. Test results are compared with those from manually tagged training examples.
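As a rough illustration of what combining topical context with local cues can look like, the sketch below scores each candidate sense with a naive Bayes model over two kinds of features: the other words in the sentence (topical) and the words immediately next to the target (local). The abstract does not name the statistical model, so the naive Bayes choice, the add-one smoothing, the names `extract_features` and `SenseClassifier`, and the four toy sentences for the noun "line" are all assumptions made for illustration. In the setting described above, the training examples would be located automatically through WordNet relations rather than written by hand.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, window=1):
    """Return topical cues (all other words) plus position-tagged local cues."""
    features = [f"topic={w}" for i, w in enumerate(tokens) if i != target_index]
    for offset in range(-window, window + 1):
        pos = target_index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            features.append(f"local[{offset:+d}]={tokens[pos]}")
    return features


class SenseClassifier:
    """Hypothetical naive Bayes sense classifier with add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        # Each example is (tokens, index of the target word, sense label).
        for tokens, target_index, sense in examples:
            self.sense_counts[sense] += 1
            for feature in extract_features(tokens, target_index):
                self.feature_counts[sense][feature] += 1
                self.vocabulary.add(feature)

    def classify(self, tokens, target_index):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            # Log prior plus smoothed log likelihood of every observed cue.
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + len(self.vocabulary)
            for feature in extract_features(tokens, target_index):
                score += math.log((self.feature_counts[sense][feature] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy, hand-written training data (hypothetical) for the noun "line".
TRAINING_EXAMPLES = [
    ("the customer waited in a long line at the bank".split(), 6, "queue"),
    ("people stood in line for tickets".split(), 3, "queue"),
    ("she wrote a line of text on the page".split(), 3, "text"),
    ("the poem has a famous opening line".split(), 6, "text"),
]

classifier = SenseClassifier()
classifier.train(TRAINING_EXAMPLES)
print(classifier.classify("they waited in line for hours".split(), 3))
```

Tagging local cues with their position keeps "in" directly before the target distinct from "in" elsewhere in the sentence, which is one simple way to let both kinds of evidence contribute to the same score.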
Similar resources
UCD-FC: Deducing semantic relations using WordNet senses that occur frequently in a database of noun-noun compounds
This paper describes a system for classifying semantic relations among nominals, as in SemEval task 4. This system uses a corpus of 2,500 compounds annotated with WordNet senses and covering 139 different semantic relations. Given a set of nominal pairs for training, as provided in the SemEval task 4 training data, this system constructs for each training pair a set of features made up of relat...
Learning Noun-Modifier Semantic Relations with Corpus-based and WordNet-based Features
We study the performance of two representations of word meaning in learning noun-modifier semantic relations. One representation is based on lexical resources, in particular WordNet, the other – on a corpus. We experimented with decision trees, instance-based learning and Support Vector Machines. All these methods work well in this learning task. We report high precision, recall and F-score, an...
Construction of Semantic Relations for Enhancing Word Sense Disambiguation in Question Answering Systems
Word sense disambiguation is a significant problem at the lexical level of natural language processing. The philosophy is to determine the meaning of a word in a particular usage, by using sense similarity and syntactic context with corpus evidence as well as semantic relations from WordNet. A training set will be constructed for each word tag (using the corpus). Each training example is repres...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
MediConceptNet: An Affinity Score Based Medical Concept Network
In healthcare, information extraction is essential in building automatic domain-specific applications. Medical concepts and their semantic identification play an important role in developing a network for visualizing medical concepts and their relations. A challenge arises when the available medical corpora are only in unstructured or semi-structured form. In the present paper, to overcome...
Journal: Computational Linguistics
Volume: 24
Publication date: 1998